-
Notifications
You must be signed in to change notification settings - Fork 495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use StandardCharsets instead of charset names #10077
Conversation
And do not try to catch UnsupportedEncodingException
8c23e66
to
641bb32
Compare
src/main/java/edu/harvard/iq/dataverse/dataaccess/TabularSubsetGenerator.java
Outdated
Show resolved
Hide resolved
@@ -29,7 +29,7 @@ | |||
import java.io.UnsupportedEncodingException; | |||
import java.nio.ByteBuffer; | |||
import java.nio.ByteOrder; | |||
|
|||
import java.nio.charset.StandardCharsets; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to also fix the other lines that use string representations of char sets?
282: private String defaultCharSet = "ISO-8859-1";
private String defaultCharSet = StandardCharsets.ISO_8859_1.name();
700: defaultCharSet = "UTF-8";
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at where defaultCharSet
is read (only in new String(.., defaultCharSet)
, it would make sense to make it an actual Charset
. How does that sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using StandardCharsets instead of the string name is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know exactly how this code interacts with the UI and APIs, which seem to allow users to choose from a list of charsets for certain files. Can we make this a separate issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
...ain/java/edu/harvard/iq/dataverse/ingest/tabulardata/impl/plugins/rdata/RDATAFileReader.java
Show resolved
Hide resolved
Closing due to inactivity. Please reopen if this issue is still relevant. |
Co-authored-by: Steven Winship <39765413+stevenwinship@users.noreply.github.com>
I didn't have time earlier; thanks for reviewing. Could you reopen this PR? |
Could you take a look at this failing test? testDatasetThumbnail – edu.harvard.iq.dataverse.api.SearchIT |
Could you show me the output of the failed test? |
Error 1 expectation failed. java.lang.AssertionError: {
}
}
}
} },
"items": [
{
"name": "Darwin's Finches",
"type": "dataset",
"url": "https://doi.org/10.5072/FK2/X3GTB1",
"global_id": "doi:10.5072/FK2/X3GTB1",
"description": "Darwin's finches (also known as the Galápagos finches) are a group of about fifteen species of passerine birds.",
"publisher": "dveade0eee",
"citationHtml": "Finch, Fiona, 2024, \"Darwin's Finches\", <a href=\"https://doi.org/10.5072/FK2/X3GTB1\" target=\"_blank\">https://doi.org/10.5072/FK2/X3GTB1</a>, Root, DRAFT VERSION",
"identifier_of_dataverse": "dveade0eee",
"name_of_dataverse": "dveade0eee",
"citation": "Finch, Fiona, 2024, \"Darwin's Finches\", https://doi.org/10.5072/FK2/X3GTB1, Root, DRAFT VERSION",
"storageIdentifier": "file://10.5072/FK2/X3GTB1",
"subjects": [
"Medicine, Health and Life Sciences"
],
"fileCount": 2,
"versionId": 92,
"versionState": "DRAFT",
"createdAt": "2024-07-09T15:00:22Z",
"updatedAt": "2024-07-09T15:00:32Z",
"contacts": [
{
"name": "Finch, Fiona",
"affiliation": ""
}
],
"authors": [
"Finch, Fiona"
]
}
],
"count_in_response": 1
}
} Jul 09, 2024 3:00:22 PM edu.harvard.iq.dataverse.api.SearchIT testDatasetThumbnail |
FWIW: The test fails at dataverse/src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java Lines 852 to 853 in 5ba74e8
|
src/main/java/edu/harvard/iq/dataverse/dataset/DatasetUtil.java
Outdated
Show resolved
Hide resolved
I made a copy-paste error when rewriting code with try-with-resources. Also: - fix duplication in variable assignment in SearchIT - remove unused imports
Thanks for the output, @stevenwinship and that pointer, @qqmyers – it's a big verbose test (perhaps it can be broken up into smaller ones at some point?). I fixed my mistake of creating a Can you see if the test passes? |
Seems to still have the same test failure: ```
|
After some investigation, local testing and adjusting log messages and log levels, I now understand less of what
Looking closer, it appears that this copies/transfers from and to itself. I did change that part of the code to use try-with-resources, so I will revert that and see what happens. |
While I don't yet understand this transfer, I suddenly dawned on me that the operations needed to be run in the same code block, because of the try-with-resources construct that closes things at the end of the block. The test now passes on my machine (in Docker) :) |
The supported version takes a Charset argument
Hey @bencomp. sorry, can I bother you to update your branch from dev again? thanks! |
What this PR does / why we need it:
This fixes a (potential) resource leak as described in #10053 and it replaces all uses of
"utf-8"
,"us-ascii"
and"iso-8859-1"
(mostly in upper case though) to get a charset withStandardCharsets.X
.The constants in
StandardCharsets
are supported on all Java implementations (since its introduction), so using them inString.getBytes(..)
will not throw anUnsupportedEncodingException
. This may simplify the code.Which issue(s) this PR closes:
Closes #10076
Closes #10053
Special notes for your reviewer:
Apologies for mixing these two issues in one PR.
Suggestions on how to test this:
See that the test suites pass.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No
Is there a release notes update needed for this change?:
No
Additional documentation:
Also linked from #10076, Sonarcloud's explanation for using StandardCharsets may be helpful.